projpredSEM

Projection predictive variable selection for Bayesian regularized SEM

Sara van Erp

Utrecht University

Goal of projpredSEM


Context: Regularized SEM, i.e., structural equation models with many parameters estimated using a penalty function (frequentist) or a shrinkage prior (Bayesian).


Goal: Provide a more formal approach to selecting parameters (and thus models) in Bayesian regularized SEM.

Regularization in MIMIC models


MIMIC model drawn with https://semdiag.psychstat.org

Bayesian regularized SEM

A shrinkage prior takes the role of the penalty:

\[ \text{posterior} \propto \text{likelihood} \times \text{prior} \]

Ideal shrinkage prior:

  1. Peaked around zero
  2. Heavy tails

Many different shrinkage priors exist (see e.g., Van Erp, Oberski, and Mulder (2019)) and several have been applied in SEM (see Van Erp (2023) for an overview). Here, we use the regularized horseshoe prior (Piironen and Vehtari (2017a)).
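To illustrate the two properties above, the following quick numerical sketch (in Python/NumPy for brevity; the analyses in this deck use R) compares draws from a plain horseshoe prior — normal with half-Cauchy local scales, i.e., without the regularization refinement of Piironen and Vehtari (2017a) — against a standard normal:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Horseshoe prior: beta | lambda ~ N(0, lambda^2), lambda ~ half-Cauchy(0, 1)
lam = np.abs(rng.standard_cauchy(n))
beta_hs = rng.normal(0.0, lam)

# Standard normal draws for comparison
beta_n = rng.normal(0.0, 1.0, n)

# Property 1 (peaked around zero): much more mass in a small interval around 0
frac_hs = np.mean(np.abs(beta_hs) < 0.1)
frac_n = np.mean(np.abs(beta_n) < 0.1)

# Property 2 (heavy tails): far larger extreme quantiles
tail_hs = np.quantile(np.abs(beta_hs), 0.999)
tail_n = np.quantile(np.abs(beta_n), 0.999)

print(frac_hs, frac_n, tail_hs, tail_n)
```

The peak shrinks small (likely irrelevant) effects strongly toward zero, while the heavy tails leave large (relevant) effects mostly unshrunken.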

Why is projpredSEM needed?


In Bayesian regularized SEM, parameters are not automatically set to zero.

  • Van Erp, Oberski, and Mulder (2019) showed that different conditions require different credible intervals (CIs) for selection in linear regression models
  • Zhang, Pan, and Ip (2021) showed that different conditions require different (arbitrary) types of selection criteria in SEM
  • As the dimensionality grows, parameters are pulled more heavily toward zero and marginal selection criteria can fail

Projection predictive variable selection: General approach


Goal: Find a smaller submodel that predicts practically as well as the larger reference model.

  1. Specify a reference model
  2. Project the posterior information of the reference model onto the candidate models
  3. Select the candidate model with the best predictive performance

See e.g., Piironen and Vehtari (2017b), Pavone et al. (2020), Piironen, Paasiniemi, and Vehtari (2020), or McLatchie et al. (2023)

Projection predictive variable selection: Technical details (1)


Idea behind posterior projection: replace the reference model posterior \(p(\theta_*|D)\) with a projected distribution \(q_\perp(\theta)\) defined on a simpler, restricted model.


Implementation: minimize the KL divergence between the induced predictive distributions:

\[ \theta_\perp = \underset{\theta}{\arg\min}\ \text{KL}\left( p(\tilde{y}|\theta_*) \,||\, p(\tilde{y}|\theta) \right) \]

Projection predictive variable selection: Technical details (2)


Practical implementation: for Gaussian linear models, an analytical solution is available instead of numerical optimization (Piironen, Paasiniemi, and Vehtari (2020)):

\[ \beta_{\text{proj}} = (X^T X)^{-1} X^T \mu_* \]

where \(\mu_*\) denotes the predictions based on the reference model and \(X\) the design matrix of the restricted model.
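As a minimal numerical sketch (Python/NumPy with simulated data, illustrative only), the analytical projection is simply a least-squares fit of the reference model's predictions onto the restricted design matrix:

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 5

# Full design matrix and simulated reference-model predictions mu_*
X_full = rng.normal(size=(n, p))
mu_star = X_full @ np.array([1.5, 0.0, -2.0, 0.0, 0.5])

# Restricted model: keep only the first three predictors
X_sub = X_full[:, :3]

# Analytical projection: beta_proj = (X^T X)^{-1} X^T mu_*
beta_proj = np.linalg.solve(X_sub.T @ X_sub, X_sub.T @ mu_star)

# Identical to an ordinary least-squares fit of mu_* on X_sub
beta_ls, *_ = np.linalg.lstsq(X_sub, mu_star, rcond=None)
```

Note that the projection targets the reference model's predictions \(\mu_*\), not the observed outcomes — this is what distinguishes it from simply refitting the submodel to the data.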

Projection predictive variable selection: Technical details (3)


  • Forward search to find candidate models
  • Evaluate predictive performance based on projected (clustered) posterior draws
  • Select the model with predictive performance closest to the reference model
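A heavily simplified sketch of such a forward search (Python/NumPy, illustrative only; projpred's actual implementation projects posterior draws and can cross-validate the search):

```python
import numpy as np

def forward_search(X, mu_star):
    """Greedy forward search: at each step, add the predictor whose
    inclusion best reproduces the reference predictions mu_star."""
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        def sse(j):
            # Project mu_star onto the current submodel plus candidate j
            cols = X[:, order + [j]]
            beta, *_ = np.linalg.lstsq(cols, mu_star, rcond=None)
            return np.sum((mu_star - cols @ beta) ** 2)
        best = min(remaining, key=sse)
        order.append(best)
        remaining.remove(best)
    return order

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Only predictors 0 and 3 drive the (simulated) reference predictions
mu_star = 5.0 * X[:, 0] + 3.0 * X[:, 3] + rng.normal(0, 0.1, 200)

order = forward_search(X, mu_star)
print(order)  # the two relevant predictors should enter first
```

The search yields a solution path of nested candidate models; the selection step then picks the smallest one whose predictive performance is close enough to the reference model.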


Available in the R package projpred (Piironen et al. (2023))

Projection predictive variable selection for the MIMIC model

library(lavaan)
library(brms)
library(projpred)

# MIMIC model: one factor measured by y1-y5, regressed on 10 covariates;
# df is a data frame containing y1-y5 and x1-x10
mod <- 'F =~ y1 + y2 + y3 + y4 + y5
        F ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10'

# Step 1: estimate factor scores with lavaan
fit.lavaan <- sem(mod, data = df)
fs <- lavPredict(fit.lavaan, method = "Bartlett")
df$fs <- as.vector(fs)

# Step 2: reference model regressing the factor scores on all covariates,
# with a (regularized) horseshoe shrinkage prior on the coefficients, e.g.:
prior_hs <- prior(horseshoe(), class = "b")
refm_fit <- brm(fs ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
                data = df,
                prior = prior_hs)
refm_obj <- get_refmodel(refm_fit)

# Step 3: cross-validated projection predictive variable selection
cvvs <- cv_varsel(
  refm_obj,
  cv_method = "kfold",
  K = 10
)
plot(cvvs)

Example results: 10 covariates, with only x5 and x7 being relevant

Preliminary simulation results

Important caveat: these results are based on < 100 replications, without cross-validating the search part.


| Variable | # of levels | Values |
|---|---|---|
| \(N_{train}\) | 1 | 150 |
| \(p_{tot}\) | 4 | 10%, 50%, 100%, or 150% of \(N_{train}\) |
| \(p_{rel}\) | 3 | \(p_{rel}/p_{tot}\) = 1/3, 1/10, or 1/100 |
| \(\rho_{AR}\) | 2 | 0.25 or 0.75 |
| \(\beta\) | 1 | 1/3 small, 1/3 medium, 1/3 large |

Sensitivity and false discovery rates (excluding small effects)

Predictive performance

Discussion


  • Important caveat: these results are based on < 100 replications, without cross-validating the search part.
  • The suggest_size heuristic in projpred can be influential.
  • projpred seems to perform well especially in very sparse settings.
  • In low-dimensional settings, marginal threshold criteria might suffice.
  • Further simulation studies into high-dimensional settings are ongoing.

Future directions


  1. Application to real-world data: ~160 neural predictors of cognitive performance (based on Michel, McCormick, and Kievit (2024))
    • Other applications (e.g., genomicSEM) might benefit as well
  2. Extension to other SEMs
    • Other models that might benefit from this approach?
    • Might require a novel implementation of the algorithm

Questions/ideas?


Feel free to reach out during this conference, or via e-mail: s.j.vanerp@uu.nl.


These slides are available online at: https://github.com/sara-vanerp/presentations

References

McLatchie, Yann, Sölvi Rögnvaldsson, Frank Weber, and Aki Vehtari. 2023. “Robust and Efficient Projection Predictive Inference.” arXiv. http://arxiv.org/abs/2306.15581.
Michel, Lea C., Ethan M. McCormick, and Rogier A. Kievit. 2024. “Grey and White Matter Metrics Demonstrate Distinct and Complementary Prediction of Differences in Cognitive Performance in Children: Findings from ABCD (n= 11 876).” The Journal of Neuroscience, February, e0465232023. https://doi.org/10.1523/jneurosci.0465-23.2023.
Pavone, Federico, Juho Piironen, Paul-Christian Bürkner, and Aki Vehtari. 2020. “Using Reference Models in Variable Selection.” arXiv. http://arxiv.org/abs/2004.13118.
Piironen, Juho, Markus Paasiniemi, Alejandro Catalina, Frank Weber, and Aki Vehtari. 2023. “projpred: Projection Predictive Feature Selection.” https://mc-stan.org/projpred/.
Piironen, Juho, Markus Paasiniemi, and Aki Vehtari. 2020. “Projective Inference in High-Dimensional Problems: Prediction and Feature Selection.” Electronic Journal of Statistics 14 (1). https://doi.org/10.1214/20-EJS1711.
Piironen, Juho, and Aki Vehtari. 2017a. “Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors.” Electronic Journal of Statistics 11 (2). https://doi.org/10.1214/17-EJS1337SI.
———. 2017b. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing 27 (3): 711–35. https://doi.org/10.1007/s11222-016-9649-y.
Van Erp, Sara. 2023. “Bayesian Regularized SEM: Current Capabilities and Constraints.” Psych 5 (3): 814–35. https://doi.org/10.3390/psych5030054.
Van Erp, Sara, Daniel L. Oberski, and Joris Mulder. 2019. “Shrinkage Priors for Bayesian Penalized Regression.” Journal of Mathematical Psychology 89 (April): 31–50. https://doi.org/10.1016/j.jmp.2018.12.004.
Zhang, Lijin, Junhao Pan, and Edward Haksing Ip. 2021. “Criteria for Parameter Identification in Bayesian Lasso Methods for Covariance Analysis: Comparing Rules for Thresholding, p -Value, and Credible Interval.” Structural Equation Modeling: A Multidisciplinary Journal 28 (6): 941–50. https://doi.org/10.1080/10705511.2021.1945456.